x I reported on a word-frequency project carried
out at the University of Adelaide. The
main reason for undertaking this work was the 
fact that almost all existing word lists for German
were based, directly or indirectly, on F. W.
Kaeding’s Häufigkeitswörterbuch 1, which is unsuitable
for pedagogic purposes. Since then a
great deal of original work has been done in the
field of lexicology 2, most of it with the help of
computers. At Adelaide we have just com- 
pleted the first stage of a computer-based word 
count for German. This article reports on its 
methods and the first tentative findings. 

Our existing list of the thousand most useful 
words 3 has served us well. We use it as a basis 
for the evaluation of the vocabulary content of 
elementary readers and it has been helpful when 
setting examinations. More recently, with the 
advent of language laboratories, the need for 
suitable materials has greatly increased, and 
most of these are produced locally. Here too 
the list has proved itself a useful aid. It is 
generally agreed that when the student is prac- 
tising certain structures the whole of his atten- 
tion should be focussed on the structure itself. 
The inclusion of unfamiliar vocabulary would 
create problems of meaning. We feel that we 
ought not to distract from the morphological 
and syntactical aspects by creating extra difficulties.
Vocabulary building must of course
be carried on side by side with the structural 
work, but this can be done just as effectively 
outside the laboratory. 

Having used our list regularly over a number 
of years we are now not altogether confident 
that the last few hundred words really have a 
claim to being included. It is quite possible that 
another word count conducted along exactly the 
same lines would produce a number of changes 
in the last quarter of the list. Words at present
included might not qualify for inclusion again
and might be replaced by other words not at
present listed. Our original sample was only 50,000
running words and in the lower frequencies the
error-margin is bound to be large. It has been
argued 4 that one would need a sample at least 
ten times as large to obtain a fully dependable 
basic vocabulary. This does not mean, of 
course, that the last quarter of the Adelaide list 
is altogether useless, since it contains words well 
worth teaching. However, we cannot say with 
any degree of certainty that the word listed in 
position 950 is more frequently used than that 
in position 990. In a word list based on another 
sample these two words might well occur in the 
reverse order. 
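
The size of this error margin can be illustrated with a rough calculation. The sketch below assumes, purely for illustration, that the occurrences of a word follow a Poisson distribution, so that a word expected k times in a sample has a relative standard error of 1/sqrt(k). The rate of 100 occurrences per million words is a hypothetical figure and is not taken from the Adelaide count.

```python
import math

def expected_count(rate_per_million, sample_size):
    """Expected occurrences of a word in a sample of the given size."""
    return rate_per_million * sample_size / 1_000_000

def relative_standard_error(count):
    """Poisson approximation (an assumption): sd sqrt(k) relative to mean k."""
    return 1.0 / math.sqrt(count)

# Hypothetical low-frequency word: 100 occurrences per million running words.
small = relative_standard_error(expected_count(100, 50_000))    # sample of 50,000
large = relative_standard_error(expected_count(100, 500_000))   # ten times as large

print(f"50,000 words:  {small:.0%}")
print(f"500,000 words: {large:.0%}")
```

Under these assumptions a word expected five times in 50,000 running words carries a relative error of about 45 per cent, against about 14 per cent in a sample ten times as large.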

We are no longer prepared to count large 
samples by hand and yet we have felt the need 
for greater precision. Moreover, we need a 
word list that goes beyond 1,000 entries, as the 
bulk of our work is concerned with the post-elementary
stages. A 1,000-word basic list can
easily achieve a text-coverage of 85-90 per cent 
of elementary readers. But how much needs to 
be added to this list to deal effectively with the 
intermediate and advanced stages of our work? 

The following table gives an answer to this 
question. Stejnfeldt 5, a Russian scholar, found
that the expansion of a vocabulary list from 
1,300 to 2,000 basic words added very little 
text coverage. 



Sample taken from     Text-coverage obtained with
                      1,300 words    2,000 words

Russian prose             74%            78%
drama                     75%            79%
newspapers                73%            78%
poetry                    65%            68%


This means that a student of Russian who 
increases his vocabulary from 1,300 to 2,000 
words will increase his text recognition by at 
most 5 words per hundred. He will still have 
to use his dictionary twenty-odd times (easily 
recognized cognates excepted) per hundred 
words. It is clear that after mastering a basic 
vocabulary students will have to learn vast 
amounts of additional words to achieve real 
reading fluency. 



*Mr. Siliakus is senior lecturer in German and director of the language laboratory at the University of 
Adelaide. 

Our own findings confirm this in a startling
way. One can cover 20% of a German text 
with a knowledge of only ten words. Twenty 
words cover 28%, fifty words 40%, one 
hundred words 46%. If these figures are 
plotted on graph paper one sees that the curve 
flattens out very quickly. Hence, the further
one goes along the curve, the more vocabulary
has to be absorbed for each further increase in
text-coverage.
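
The flattening of the curve can be made concrete by working out how much coverage each additional word contributes between the points quoted above. The following is a minimal sketch that uses only the four figures given in the text; the function name is our own.

```python
# Figures quoted in the text: (words known, per cent of text covered).
reported = [(10, 20), (20, 28), (50, 40), (100, 46)]

def marginal_gain_per_word(points):
    """Percentage points of coverage gained per added word, step by step."""
    gains = []
    prev_words, prev_cover = 0, 0
    for words, cover in points:
        gain = (cover - prev_cover) / (words - prev_words)
        gains.append((prev_words, words, round(gain, 2)))
        prev_words, prev_cover = words, cover
    return gains

for start, end, gain in marginal_gain_per_word(reported):
    print(f"words {start:>3} to {end:>3}: {gain} points per word")
```

Each of the first ten words contributes 2 percentage points on average, whereas each word between the fiftieth and the hundredth contributes only 0.12.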

Since we were not prepared to analyse a 
large text by the old-fashioned method, we used 
the university’s computer 6. We had over
100,000 words typed out on paper tape and 
transferred on to magnetic tape. We chose texts 
from musicological writings. For some time, 
several colleagues have been unhappy about 
their students’ lack of reading ability. We in- 
tend to help them by providing specialized word 
lists for music, history, geography, psychology 
and literary criticism. In this way we hope to 
achieve two things at the same time: firstly, 
word-lists for several Arts disciplines, and 
secondly a large sample of at least half a mil- 
lion words, from which we can compile a 
dependable general list of up to 2,000 words. 



Present indications are that a list of 2,000 
general words and a special subject list of say 
500 specialized terms would give a very good 
text-coverage, probably somewhere near 90%.

In conclusion I should like to describe very 
briefly the actual method employed. As men- 
tioned above, our first stage was the typing of 
a text of over 100,000 running words. Then 
we compiled two dictionaries which were also 
transferred on to tape. The first consisted of 
our 1,000 most useful words together with all 
their morphological possibilities. Our list gives 
stems only. However, in the text a verb like 
geben may occur as gebe, gibt, gab, gegeben 
etc. We used a code of indices to instruct the 
machine to charge all those possible occurrences 
to the stem word, but to keep a tally of the 
individual frequencies. It was necessary to 
make similar arrangements for adjectives and 
nouns. This first store finally came to about 
3,300 entries. 
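
The principle of charging each inflected form to its stem while keeping a separate tally per form can be sketched as follows. This is an illustration in a present-day language and not the programme actually used; the small dictionary, the function name and the decision to ignore capitalisation are all assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical miniature of the first dictionary: inflected form -> stem word.
FORM_TO_STEM = {
    "geben": "geben",
    "gebe": "geben",
    "gibt": "geben",
    "gab": "geben",
    "gegeben": "geben",
}

def count_by_stem(tokens, form_to_stem):
    """Return total occurrences per stem and a tally of each individual form."""
    stem_totals = Counter()
    form_tallies = defaultdict(Counter)
    for token in tokens:
        form = token.lower()          # assumption: capitalisation is ignored
        stem = form_to_stem.get(form)
        if stem is None:
            continue                  # form is not covered by this dictionary
        stem_totals[stem] += 1
        form_tallies[stem][form] += 1
    return stem_totals, form_tallies

totals, tallies = count_by_stem(
    ["Er", "gibt", "mir", "was", "er", "gab", "und", "gibt"], FORM_TO_STEM
)
print(totals["geben"], dict(tallies["geben"]))
```

A stem list of 1,000 words expands in this way to a much larger store of forms, which is consistent with the figure of about 3,300 entries reported above.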

Then we made a list of music terms in the 
same way. We abstracted some 500 words 
from articles and by listing the instruments of 
the orchestra etc. These lists were also trans- 
ferred on to magnetic tape. 

The computer then produced first of all a
list of all the words in the sample that were 
covered by our basic 1,000 and their forms. 
This accounted for 64% of the sample. Then 
a list of words was printed that were covered 
by our music dictionary; the coverage was
about 5%.

This meant that just over 30% of the original
text had not been covered by our two dictionaries.
Of course we were anxious to have
a look at these rejects. We found that the bulk 
of them consisted of very low-frequency words 
(1-3 occurrences in 100,000 words), but that 
there were masses of high-frequency proper 
nouns (Beethoven, Beethovens), as well as 
high-frequency cognates (Komposition, Instru- 
ment, Text). A preliminary calculation points 
to a total coverage of about 85%.
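
The three-way division of the sample into words covered by the basic list, words covered by the music dictionary, and rejects can be sketched as follows. The word sets and the sample sentence are hypothetical toy data, and checking the basic list before the subject list is an assumption about the order of the passes.

```python
from collections import Counter

def coverage_report(tokens, basic_forms, subject_forms):
    """Split a text into basic-list hits, subject-list hits and rejects."""
    if not tokens:
        raise ValueError("the sample must contain at least one word")
    basic_hits = 0
    subject_hits = 0
    rejects = Counter()
    for token in tokens:
        form = token.lower()          # assumption: capitalisation is ignored
        if form in basic_forms:
            basic_hits += 1
        elif form in subject_forms:
            subject_hits += 1
        else:
            rejects[form] += 1        # kept for later inspection
    total = len(tokens)
    return {
        "basic": basic_hits / total,
        "subject": subject_hits / total,
        "uncovered": sum(rejects.values()) / total,
        "rejects": rejects.most_common(),
    }

# Hypothetical toy data, far smaller than the dictionaries described above.
basic = {"die", "ist", "von"}
music = {"sonate"}
report = coverage_report("Die Sonate ist von Beethoven".split(), basic, music)
print(report)
```

Sorting the rejects by frequency brings recurring proper nouns and cognates to the top of the list, where they can be separated from the words that occur only once or twice.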

With these results in front of us all sorts of 
exciting projects can be undertaken. Our first 
priority is a manageable word-list for music 
students. The general frequency list will have 
to wait until another four batches are analysed, 
and this will take time and money. However, 
preparations are under way for the analysis of 
the vocabulary content of literary criticism. 
Once this has been done we shall know much 
more about the sort of vocabulary a second- 
year university student should know if he is 
to write an essay on the lyrical poems of the 
Romantic period. It is with such concrete and 
practical issues in mind that we have embarked 
on this, our second stage of vocabulary re- 
search. 

Notes

1 F. W. Kaeding, Häufigkeitswörterbuch der deutschen
Sprache, Berlin, 1898.

2 Two of the best-known institutions where computer-based
research in lexicology is carried out are the
Centro per l’Automazione dell’Analisi Letteraria
in Gallarate near Milan, where under the direction
of Pater Roberto Busa S.J. concordances of
medieval philosophers are being compiled; and
the Centre d'Étude du Vocabulaire Français in
Besançon, headed by Professor B. Quemada. It was
at Besançon that the most recent word count of
Basic Spoken German was analysed. The findings
have been published by Prentice Hall, New Jersey,
1964: J. Alan Pfeffer, Grunddeutsch. Basic (Spoken)
German Word List. For a general account of
modern work in vocabulary research, see the article in
Beiträge zur Sprachkunde und Informationsverarbeitung
by F. de Tollenaere.

3 The Adelaide List of the 1,000 most useful words in
German (ed. H. J. Siliakus), Department of German,
University of Adelaide, October 1964.

4 Rolf-Dietrich Keil, Einheitliche Methoden in der
Lexikometrie, IRAL III/2, 1965, p. 102 ff.

5 Table based on IRAL II/4, 1964, pp. 245, 246.

6 We wish to thank the Computing Centre of the 
University of Adelaide and especially Mr. K. t. 
Lee, a Ph.D. student from Malaysia, who designed 
the computer programme for us. 